This use case evaluates the sufficiency of on-street parking around cafes, restaurants, bistros, bars, taverns, and pubs in Melbourne. By analysing the locations of car parking meters, parking bays, and business establishments, the City of Melbourne can ensure adequate parking facilities to support local businesses and enhance visitor convenience.
User Story
As a cafe owner, I want to know the availability of nearby parking spaces so that I can ensure my customers have convenient parking options.
As a restaurant manager, I want to know about the availability of nearby parking so that I can enhance the dining experience for my guests.
As a customer, I want to identify available parking spots so that I can easily visit the bistro without the frustration of a lengthy parking search.
As a council planner, I want to know the availability of nearby parking spaces around cafés so that we can ensure adequate parking infrastructure for the convenience of residents and visitors.
At the end of this use case you will:
-Learn to import data sets using API v2.1
-Learn data manipulation and merging of datasets
-Learn data visualisation using matplotlib and seaborn
-Learn to work with categorical locations and geolocations using geopy
-Learn to plot maps using folium
-Develop a scoring model and categorise parking availability based on the score calculation
-Calculate the distance between each business location and the parking spots, and identify the closest parking locations for a given venue
-Develop regression models and a simple FFNN, and evaluate them
Data Sets Used:
Data Set 1: Cafes and Restaurants with Seating Capacity.
This data set contains Block ID, Property ID, Base property ID, Building address, CLUE small area, Trading name, Business address, Industry code, Industry description, Seating type, Number of seats, Longitude, Latitude and Location. It was used to identify the locations of cafes and restaurants along with their seating capacities. The data set is imported from the Melbourne Open Data website using API v2.1.
Data Set 2: Bars and Pubs with Patron Capacity.
This data set contains Block ID, Property ID, Base property ID, Building address, CLUE small area, Trading name, Business address, Number of patrons, Longitude, Latitude and Location. It was used to identify the locations of bars, taverns and pubs along with their patron capacities. The data set is imported from the Melbourne Open Data website using API v2.1.
Data Set 3: On Street Parking Bays.
This data set contains Road segment ID, Kerbside ID, Road segment description, Latitude, Longitude and Last updated. It was used to identify the locations of the parking bays within the City of Melbourne. The data set is imported from the Melbourne Open Data website using API v2.1.
Data Set 4: On Street Car Parking Meters with Location.
This data set contains Barcode, Credit card, Tap and go, Longitude, Latitude, Location, Asset ID, Meter ID and Location description. It was used to identify the locations of the parking meters within the City of Melbourne. The data set is imported from the Melbourne Open Data website using API v2.1.
Outline of the Use Case
- Data Preprocessing
I started the use case by cleaning and preparing each dataset for analysis. This involves:
- Handling missing values and duplicates: remove or impute missing values in latitude, longitude, and other critical fields.
- Geospatial alignment: convert all datasets into a common format using latitude and longitude to enable spatial calculations.
- Data merging: merge datasets based on proximity to business locations (cafes, restaurants, etc.).
- Data Visualization
- Interactive maps: used tools like Folium to create an interactive map showing the locations of cafes, restaurants and bars, and their proximity to parking meters and bays.
- Bar charts, stacked bar charts, pie charts, multiple bar charts and other graphs and tables: to identify the key insights.
- Feature Engineering
Next, I created features that will help the model understand the relationship between businesses and parking availability:
- Distance from parking to business: computed the distance between each business and the nearest parking bay or meter.
- Number of nearby parking spaces: counted the number of parking bays within a certain radius (e.g., 100 metres, 200 metres) around each business.
- Type of business: included categorical variables for the industry type of each business (cafe, restaurant, bar, etc.) and its seating/patron capacity.
- Model Selection and Model Building
Geospatial Clustering Model (Density-based):
- DBSCAN: This clustering algorithm was used to group businesses based on the density of parking spaces around them. Areas with higher parking density would support more businesses, whereas areas with lower density might require parking interventions.
- K-means: clustered businesses to analyse the distribution of parking availability across the different clusters.
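A minimal sketch of the density-based clustering step, assuming DBSCAN is run directly on venue coordinates with scikit-learn's haversine metric (the 150 m eps and min_samples values below are illustrative, not the notebook's tuned parameters); note that the haversine metric expects [latitude, longitude] in radians, so eps must be converted from metres:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used to convert metres to radians

def cluster_venues(lat, lon, eps_m=150, min_samples=5):
    """Cluster venues by spatial density; returns one label per venue (-1 = noise)."""
    coords = np.radians(np.column_stack([lat, lon]))
    db = DBSCAN(eps=eps_m / EARTH_RADIUS_M, min_samples=min_samples,
                metric='haversine').fit(coords)
    return db.labels_
```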
Regression Model:
- Multiple Linear Regression/GLM: Predicted the sufficiency of parking based on features like business type, seating capacity, number of nearby parking bays/meters, and distance to parking.
- Logistic Regression: a binary model was created to classify whether parking is sufficient or insufficient based on the input features.
Spatial Autoregressive Model (SAR): used where there is spatial dependency in the data (e.g., businesses close to each other may share parking resources).
Random Forests/Gradient Boosting:
For more complex relationships, Random Forest and Gradient Boosting models were used to predict parking sufficiency from multiple features, including geospatial ones.
- Deep Learning Approach for Predicting Parking Accessibility with Custom Metrics (FFNN)
Model Evaluation Metrics
Evaluated the models using the metrics below:
- Mean Absolute Error (MAE) / Mean Squared Error (MSE): For regression models to predict parking sufficiency.
- Clustering metrics: Evaluated the density-based clusters using silhouette score.
- Classification metrics: for logistic regression, accuracy, precision, recall, and F1-score were used to assess the sufficiency classification.
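As a toy illustration of the classification metrics on hypothetical binary labels (1 = parking sufficient, 0 = insufficient; the label vectors below are made up for demonstration):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]  # hypothetical ground-truth sufficiency labels
y_pred = [1, 0, 0, 1, 0, 1]  # hypothetical model predictions

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # of predicted-sufficient venues, fraction truly sufficient
print(recall_score(y_true, y_pred))     # of truly sufficient venues, fraction found
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```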
- Outputs
- Map of parking availability: visualise parking availability by plotting parking bays/meters on a map and overlaying the closest five business locations.
- Proximity analysis: for each business, details on the number of nearby parking spots at different distances.
- Parking sufficiency score and categorisation: developed a scoring system to rank each business by its parking sufficiency score.
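The scoring system could take a form like the sketch below. Everything here is hypothetical: the score formula, its weights, and the category cut-offs are illustrative placeholders, not the values developed later in the notebook.

```python
import pandas as pd

def parking_score(n_bays_200m, dist_to_nearest_m, seats):
    """Illustrative score: more nearby bays raise it, longer walks and larger venues lower it."""
    return n_bays_200m / max(seats, 1) * 100 - dist_to_nearest_m / 10

def categorise(scores):
    """Bin raw scores into sufficiency categories (cut-offs are hypothetical)."""
    return pd.cut(scores, bins=[-float('inf'), 0, 20, float('inf')],
                  labels=['Insufficient', 'Moderate', 'Sufficient'])
```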
Importing Required Libraries¶
import requests
import pandas as pd
import numpy as np
import re
from io import StringIO
import geopy
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
from geopy.distance import geodesic, great_circle
import geopandas as gpd
import folium
from ipywidgets import interact, widgets
from IPython.display import display, clear_output
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc, silhouette_score
from sklearn.ensemble import RandomForestRegressor
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization, LeakyReLU
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import backend as K
Loading all Data sets¶
base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id = 'cafes-and-restaurants-with-seating-capacity'
url = f'{base_url}{dataset_id}/exports/csv'
params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
response = requests.get(url, params=params)
if response.status_code == 200:
    url_content = response.content.decode('utf-8')
    cafes_df = pd.read_csv(StringIO(url_content), delimiter=';')
    print(cafes_df.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
census_year block_id property_id base_property_id \
0 2017 6 578324 573333
1 2017 6 578324 573333
2 2017 11 103957 103957
3 2017 11 103957 103957
4 2017 11 103985 103985
5 2017 11 103987 103987
6 2017 11 103993 103993
7 2017 11 108971 108971
8 2017 11 559405 559405
9 2017 12 103955 103955
building_address clue_small_area \
0 2 Swanston Street MELBOURNE 3000 Melbourne (CBD)
1 2 Swanston Street MELBOURNE 3000 Melbourne (CBD)
2 517-537 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
3 517-537 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
4 562-564 Flinders Street MELBOURNE 3000 Melbourne (CBD)
5 550-554 Flinders Street MELBOURNE 3000 Melbourne (CBD)
6 508-514 Flinders Street MELBOURNE 3000 Melbourne (CBD)
7 561-585 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
8 547-555 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
9 475-485 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
trading_name \
0 Transport Hotel
1 Transport Hotel
2 Altius Coffee Brewers
3 Five & Dime Bagel
4 YHA Melbourne Central
5 Cafe Keyif
6 Waterside Hotel
7 Holiday Inn On Flinders Melbourne
8 Rex Tremendae
9 Masala Grill
business_address industry_anzsic4_code \
0 Tenancy 29, Ground , 2 Swanston Street MELBOUR... 4520
1 Tenancy 29, Ground , 2 Swanston Street MELBOUR... 4520
2 Shop , Ground , 517 Flinders Lane MELBOURNE 3000 4512
3 16 Flinders Lane MELBOURNE 3000 1174
4 562-564 Flinders Street MELBOURNE 3000 4400
5 Unit 1, 8 Flinders Street MELBOURNE 3000 4511
6 508-514 Flinders Street MELBOURNE 3000 4520
7 561-585 Flinders Lane MELBOURNE 3000 4400
8 Part 547 Flinders Lane MELBOURNE 3000 4512
9 Ground , 485 Flinders Lane MELBOURNE 3000 4511
industry_anzsic4_description seating_type \
0 Pubs, Taverns and Bars Seats - Indoor
1 Pubs, Taverns and Bars Seats - Outdoor
2 Takeaway Food Services Seats - Outdoor
3 Bakery Product Manufacturing (Non-factory based) Seats - Indoor
4 Accommodation Seats - Indoor
5 Cafes and Restaurants Seats - Indoor
6 Pubs, Taverns and Bars Seats - Indoor
7 Accommodation Seats - Indoor
8 Takeaway Food Services Seats - Outdoor
9 Cafes and Restaurants Seats - Indoor
number_of_seats longitude latitude \
0 230 144.969942 -37.817778
1 120 144.969942 -37.817778
2 4 144.956486 -37.819875
3 14 144.956486 -37.819875
4 43 144.955635 -37.820595
5 50 144.955969 -37.820399
6 250 144.957150 -37.820166
7 70 144.955104 -37.820200
8 14 144.955838 -37.820019
9 120 144.958100 -37.819368
location
0 -37.817777826050005, 144.96994164279243
1 -37.817777826050005, 144.96994164279243
2 -37.819875445799994, 144.95648638781466
3 -37.819875445799994, 144.95648638781466
4 -37.82059511593975, 144.9556348088
5 -37.82039904747718, 144.95596914650002
6 -37.82016599807918, 144.9571504476
7 -37.820200051650005, 144.95510422847929
8 -37.82001949059274, 144.95583815835
9 -37.81936815835749, 144.95810006565
base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id = 'bars-and-pubs-with-patron-capacity'
url = f'{base_url}{dataset_id}/exports/csv'
params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
response = requests.get(url, params=params)
if response.status_code == 200:
    url_content = response.content.decode('utf-8')
    bars_df = pd.read_csv(StringIO(url_content), delimiter=';')
    print(bars_df.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
census_year block_id property_id base_property_id \
0 2002 11 108972 108972
1 2002 14 103172 103172
2 2002 15 103944 103944
3 2002 16 103938 103938
4 2002 17 103925 103925
5 2002 33 105937 105937
6 2002 35 102143 102143
7 2002 37 102158 102158
8 2002 42 105392 105392
9 2002 43 108069 108069
building_address clue_small_area \
0 10-22 Spencer Street MELBOURNE 3000 Melbourne (CBD)
1 31-39 Elizabeth Street MELBOURNE 3000 Melbourne (CBD)
2 277-279 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
3 187 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
4 121-123 Flinders Lane MELBOURNE 3000 Melbourne (CBD)
5 435-443 Little Collins Street MELBOURNE 3000 Melbourne (CBD)
6 258-274 Collins Street MELBOURNE 3000 Melbourne (CBD)
7 86-88 Collins Street MELBOURNE 3000 Melbourne (CBD)
8 120-138 King Street MELBOURNE 3000 Melbourne (CBD)
9 131-141 Queen Street MELBOURNE 3000 Melbourne (CBD)
trading_name \
0 Explorers Inn
1 Connells Tavern
2 De Biers
3 Adelphi Hotel
4 Velour
5 The Kitchen Restaurant
6 Australia On Collins Bar
7 Dr Martin's Tavern
8 The Exchange Hotel
9 The Turf Bar & Restaurant
business_address number_of_patrons \
0 10-22 Spencer Street MELBOURNE 3000 50
1 35 Elizabeth Street MELBOURNE 3000 350
2 Unit 1, Basement , 277 Flinders Lane MELBOURNE... 400
3 187 Flinders Lane MELBOURNE 3000 80
4 Unit 1, Gnd & Bmt , 121 Flinders Lane MELBOURN... 350
5 LwrGround , 25 Little Collins Street MELBOURNE... 200
6 Shop 121, Basement , 260 Collins Street MELBOU... 11
7 86A Collins Street MELBOURNE 3000 291
8 Part Ground , 120 King Street MELBOURNE 3000 300
9 Unit 1, LwrGround , 131 Queen Street MELBOURNE... 400
longitude latitude location
0 144.955254 -37.820511 -37.82051068881513, 144.95525416628004
1 144.964322 -37.817426 -37.81742610667125, 144.964321660097
2 144.965307 -37.817242 -37.81724194023457, 144.96530699086
3 144.968385 -37.816360 -37.81635974400108, 144.9683846004515
4 144.970523 -37.815674 -37.815673646380745, 144.97052296371248
5 144.960107 -37.816397 -37.816397070511165, 144.96010676156988
6 144.964919 -37.815415 -37.81541503501324, 144.96491857864146
7 144.970594 -37.813938 -37.8139380932959, 144.97059378480083
8 144.956417 -37.817034 -37.81703382441968, 144.95641679014355
9 144.960543 -37.815497 -37.815496733626624, 144.96054302812774
base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id = 'on-street-parking-bays'
url = f'{base_url}{dataset_id}/exports/csv'
params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
response = requests.get(url, params=params)
if response.status_code == 200:
    url_content = response.content.decode('utf-8')
    parkingbays_df = pd.read_csv(StringIO(url_content), delimiter=';')
    print(parkingbays_df.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
roadsegmentid kerbsideid \
0 22730 NaN
1 22730 NaN
2 20013 5701
3 20013 23444
4 22268 NaN
5 22295 NaN
6 22295 NaN
7 22295 NaN
8 21108 NaN
9 20950 NaN
roadsegmentdescription latitude longitude \
0 Park Street between Mason Street and Randall P... -37.836245 144.982021
1 Park Street between Mason Street and Randall P... -37.835800 144.982115
2 Lonsdale Street between William Street and Kin... -37.814238 144.955451
3 Lonsdale Street between William Street and Kin... -37.814271 144.955334
4 Clowes Street between Anderson Street and Wals... -37.830568 144.984713
5 Anderson Street between Domain Road and Acland... -37.833607 144.983763
6 Anderson Street between Domain Road and Acland... -37.833657 144.983753
7 Anderson Street between Domain Road and Acland... -37.833817 144.983720
8 Courtney Street between Queensberry Street and... -37.803262 144.955425
9 Queensberry Street between Capel Street and Ho... -37.803515 144.954739
lastupdated
0 2022-08-31
1 2022-08-31
2 2023-10-02
3 2023-10-02
4 2022-08-31
5 2022-08-31
6 2022-08-31
7 2022-08-31
8 2022-08-31
9 2022-08-31
base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
dataset_id = 'on-street-car-parking-meters-with-location'
url = f'{base_url}{dataset_id}/exports/csv'
params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
response = requests.get(url, params=params)
if response.status_code == 200:
    url_content = response.content.decode('utf-8')
    parkingmeters_df = pd.read_csv(StringIO(url_content), delimiter=';')
    print(parkingmeters_df.head(10))
else:
    print(f'Request failed with status code {response.status_code}')
barcode creditcard tapandgo longitude latitude \
0 MPM1629529 Yes Yes 144.953144 -37.781968
1 MPM1629539 Yes Yes 144.950263 -37.787154
2 MPM1629546 Yes Yes 144.952908 -37.781829
3 MPM1629570 Yes Yes 144.950592 -37.787432
4 MPM1629571 Yes Yes 144.950776 -37.781745
5 MPM1629572 Yes Yes 144.949931 -37.781986
6 MPM1629635 Yes Yes 144.954811 -37.783342
7 MPM1630378 Yes Yes 144.976562 -37.809093
8 MPM1630435 Yes Yes 144.952670 -37.785996
9 MPM1630436 Yes Yes 144.955421 -37.783839
location asset_id meter_id \
0 -37.7819682, 144.9531444 1629529 ZOOT8
1 -37.7871538, 144.9502628 1629539 ZOOT23A
2 -37.7818288, 144.9529081 1629546 ZOOT8A
3 -37.7874323, 144.9505921 1629570 ZOOT23
4 -37.7817446, 144.9507757 1629571 ZOOT2
5 -37.7819862, 144.9499307 1629572 ZOOT1
6 -37.7833424, 144.9548109 1629635 ZOOT12
7 -37.8090933, 144.9765624 1630378 871A
8 -37.785996, 144.9526703 1630435 ZOOT20
9 -37.7838385, 144.9554213 1630436 ZOOT13
location_description
0 Royal Park
1 Royal Park
2 Royal Park
3 Royal Park
4 Royal Park, approximately 73m NW of Leopard Lo...
5 Royal Park
6 Royal Park
7 Morrison Place between Victoria Parade and Alb...
8 Royal Park, approximately 84m SW of Melbourne ...
9 Royal Park
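The four download cells above repeat the same request pattern. As a sketch, it could be factored into a small helper; raising an exception on failure (rather than printing) is my choice here, not the notebook's:

```python
import requests
import pandas as pd
from io import StringIO

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

def fetch_dataset(dataset_id):
    """Export a Melbourne Open Data dataset as CSV via API v2.1 and load it."""
    url = f'{BASE_URL}{dataset_id}/exports/csv'
    params = {'select': '*', 'limit': -1, 'lang': 'en', 'timezone': 'UTC'}
    response = requests.get(url, params=params)
    if response.status_code != 200:
        raise RuntimeError(f'Request failed with status code {response.status_code}')
    return pd.read_csv(StringIO(response.content.decode('utf-8')), delimiter=';')

# e.g. cafes_df = fetch_dataset('cafes-and-restaurants-with-seating-capacity')
```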
Cafes and restaurants with seating capacity.¶
I performed several data cleaning steps:
- Filtered out past data, keeping only the most recent records
- Dropped the location, industry_anzsic4_code, block_id, census_year, property_id, base_property_id, business_address and building_address columns, and renamed the industry description column
cafes_df.head()
| census_year | block_id | property_id | base_property_id | building_address | clue_small_area | trading_name | business_address | industry_anzsic4_code | industry_anzsic4_description | seating_type | number_of_seats | longitude | latitude | location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 6 | 578324 | 573333 | 2 Swanston Street MELBOURNE 3000 | Melbourne (CBD) | Transport Hotel | Tenancy 29, Ground , 2 Swanston Street MELBOUR... | 4520 | Pubs, Taverns and Bars | Seats - Indoor | 230 | 144.969942 | -37.817778 | -37.817777826050005, 144.96994164279243 |
| 1 | 2017 | 6 | 578324 | 573333 | 2 Swanston Street MELBOURNE 3000 | Melbourne (CBD) | Transport Hotel | Tenancy 29, Ground , 2 Swanston Street MELBOUR... | 4520 | Pubs, Taverns and Bars | Seats - Outdoor | 120 | 144.969942 | -37.817778 | -37.817777826050005, 144.96994164279243 |
| 2 | 2017 | 11 | 103957 | 103957 | 517-537 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Altius Coffee Brewers | Shop , Ground , 517 Flinders Lane MELBOURNE 3000 | 4512 | Takeaway Food Services | Seats - Outdoor | 4 | 144.956486 | -37.819875 | -37.819875445799994, 144.95648638781466 |
| 3 | 2017 | 11 | 103957 | 103957 | 517-537 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Five & Dime Bagel | 16 Flinders Lane MELBOURNE 3000 | 1174 | Bakery Product Manufacturing (Non-factory based) | Seats - Indoor | 14 | 144.956486 | -37.819875 | -37.819875445799994, 144.95648638781466 |
| 4 | 2017 | 11 | 103985 | 103985 | 562-564 Flinders Street MELBOURNE 3000 | Melbourne (CBD) | YHA Melbourne Central | 562-564 Flinders Street MELBOURNE 3000 | 4400 | Accommodation | Seats - Indoor | 43 | 144.955635 | -37.820595 | -37.82059511593975, 144.9556348088 |
cafes_df.shape
(60055, 15)
### Keep only the latest data
cafes_df['census_year'].max()
cafes_df = cafes_df[cafes_df['census_year'] == 2022]
cafes_df.nunique()
census_year                        1
block_id                         281
property_id                     1272
base_property_id                1206
building_address                1272
clue_small_area                   13
trading_name                    1943
business_address                2043
industry_anzsic4_code             33
industry_anzsic4_description      33
seating_type                       2
number_of_seats                  194
longitude                       1206
latitude                        1206
location                        1206
dtype: int64
# Drop identifier and address columns that are not needed for the analysis
cafes_df = cafes_df.drop(columns=['location', 'industry_anzsic4_code', 'block_id', 'census_year', 'property_id', 'base_property_id', 'business_address', 'building_address'])
# Rename 'industry_anzsic4_description' to 'industry_description'
cafes_df = cafes_df.rename(columns={'industry_anzsic4_description': 'industry_description'})
Bars and Pubs with Patron Capacity.¶
I performed several data cleaning steps:
- Filtered out past data, keeping only the most recent records
- Dropped the location, census_year, block_id, property_id, base_property_id, business_address and building_address columns, and renamed the Number of patrons column
bars_df.head()
| census_year | block_id | property_id | base_property_id | building_address | clue_small_area | trading_name | business_address | number_of_patrons | longitude | latitude | location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 | 11 | 108972 | 108972 | 10-22 Spencer Street MELBOURNE 3000 | Melbourne (CBD) | Explorers Inn | 10-22 Spencer Street MELBOURNE 3000 | 50 | 144.955254 | -37.820511 | -37.82051068881513, 144.95525416628004 |
| 1 | 2002 | 14 | 103172 | 103172 | 31-39 Elizabeth Street MELBOURNE 3000 | Melbourne (CBD) | Connells Tavern | 35 Elizabeth Street MELBOURNE 3000 | 350 | 144.964322 | -37.817426 | -37.81742610667125, 144.964321660097 |
| 2 | 2002 | 15 | 103944 | 103944 | 277-279 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | De Biers | Unit 1, Basement , 277 Flinders Lane MELBOURNE... | 400 | 144.965307 | -37.817242 | -37.81724194023457, 144.96530699086 |
| 3 | 2002 | 16 | 103938 | 103938 | 187 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Adelphi Hotel | 187 Flinders Lane MELBOURNE 3000 | 80 | 144.968385 | -37.816360 | -37.81635974400108, 144.9683846004515 |
| 4 | 2002 | 17 | 103925 | 103925 | 121-123 Flinders Lane MELBOURNE 3000 | Melbourne (CBD) | Velour | Unit 1, Gnd & Bmt , 121 Flinders Lane MELBOURN... | 350 | 144.970523 | -37.815674 | -37.815673646380745, 144.97052296371248 |
### Filter only the latest data
bars_df = bars_df[bars_df['census_year'] == 2022]
bars_df.shape
(281, 12)
bars_df.nunique()
census_year           1
block_id            119
property_id         242
base_property_id    234
building_address    242
clue_small_area      11
trading_name        281
business_address    280
number_of_patrons   126
longitude           234
latitude            234
location            234
dtype: int64
# Drop identifier and address columns that are not needed for the analysis
bars_df = bars_df.drop(columns=['location', 'census_year', 'block_id', 'property_id', 'base_property_id', 'business_address', 'building_address'])
# Rename 'number_of_patrons' to 'number_of_seats'
bars_df = bars_df.rename(columns={'number_of_patrons': 'number_of_seats'})
# Add industry description column
bars_df['industry_description'] = 'Pubs, Taverns and Bars'
# Add seating type column
bars_df['seating_type'] = 'not specified'
Create a New Single Data Set containing all Cafes, Restaurants, Bars, Taverns and Pubs with seating capacity.¶
I created a single DataFrame, business_df, by merging the Cafes and Restaurants data set with the Bars and Pubs data set.
- Checked and handled null values in the merged data set.
- Checked the industry-wise contribution and grouped the less frequent industries into a single 'Other' category.
- Checked for duplicate values and removed duplicates.
- Used LabelEncoder to convert categorical data to numeric data.
# Find the longitudes (used here as a location key) already present in cafes_df
existing_longitudes = cafes_df['longitude'].unique()
# Keep only the bars_df rows at locations not already in cafes_df
filtered_bars_df = bars_df[~bars_df['longitude'].isin(existing_longitudes)]
# Append the filtered rows to cafes_df
business_df = pd.concat([cafes_df, filtered_bars_df], ignore_index=True)
business_df.shape
(3079, 7)
business_df.nunique()
clue_small_area           13
trading_name            1991
industry_description      33
seating_type               3
number_of_seats          204
longitude               1251
latitude                1251
dtype: int64
### Check for null values
business_df.isna().sum()
clue_small_area         0
trading_name            0
industry_description    0
seating_type            0
number_of_seats         0
longitude               0
latitude                0
dtype: int64
No null values were found in the data set.
Label-encode the categorical variables into numerical variables
# Create a label encoder object
label_encoder = LabelEncoder()
# Apply label encoding to 'clue_small_area' and 'industry_description'
business_df['Area'] = label_encoder.fit_transform(business_df['clue_small_area'])
business_df['industry'] = label_encoder.fit_transform(business_df['industry_description'])
# Calculate the total count of all rows
total_count = business_df['industry_description'].value_counts().sum()
# Get the counts for each industry_description and convert to DataFrame
industry_counts = business_df['industry_description'].value_counts().reset_index()
# Rename the columns for clarity
industry_counts.columns = ['industry_description', 'Count']
# Add a new column for the percentage contribution
industry_counts['Contribution %'] = ((industry_counts['Count'] / total_count) * 100).round()
# Sort by count for clarity and display the result
industry_counts_sorted = industry_counts.sort_values(by='Count', ascending=False)
industry_counts_sorted
| industry_description | Count | Contribution % | |
|---|---|---|---|
| 0 | Cafes and Restaurants | 2295 | 75.0 |
| 1 | Takeaway Food Services | 395 | 13.0 |
| 2 | Pubs, Taverns and Bars | 231 | 8.0 |
| 3 | Accommodation | 62 | 2.0 |
| 4 | Bakery Product Manufacturing (Non-factory based) | 23 | 1.0 |
| 5 | Other Specialised Food Retailing | 11 | 0.0 |
| 6 | Catering Services | 9 | 0.0 |
| 7 | Clubs (Hospitality) | 8 | 0.0 |
| 8 | Sports and Physical Recreation Venues, Grounds... | 8 | 0.0 |
| 9 | Non-Residential Property Operators | 6 | 0.0 |
| 10 | Amusement and Other Recreational Activities n.... | 3 | 0.0 |
| 14 | Fruit and Vegetable Retailing | 2 | 0.0 |
| 15 | Flower Retailing | 2 | 0.0 |
| 16 | Mens Clothing Retailing | 2 | 0.0 |
| 13 | Performing Arts Operation | 2 | 0.0 |
| 11 | Supermarket and Grocery Stores | 2 | 0.0 |
| 12 | Performing Arts Venue Operation | 2 | 0.0 |
| 25 | Museum Operation | 1 | 0.0 |
| 31 | Zoological and Botanical Gardens Operation | 1 | 0.0 |
| 30 | Telecommunication Goods Wholesaling | 1 | 0.0 |
| 29 | Health and Fitness Centres and Gymnasia Operation | 1 | 0.0 |
| 28 | Motion Picture Exhibition | 1 | 0.0 |
| 27 | Car Retailing | 1 | 0.0 |
| 26 | Other Administrative Services n.e.c. | 1 | 0.0 |
| 23 | Horse and Dog Racing Administration and Track ... | 1 | 0.0 |
| 24 | Newspaper and Book Retailing | 1 | 0.0 |
| 22 | Other Gambling Activities | 1 | 0.0 |
| 21 | Other Food Product Manufacturing n.e.c. | 1 | 0.0 |
| 20 | Casino Operation | 1 | 0.0 |
| 19 | Hairdressing and Beauty Services | 1 | 0.0 |
| 18 | Other Interest Group Services n.e.c. | 1 | 0.0 |
| 17 | Religious Services | 1 | 0.0 |
| 32 | Other Store-Based Retailing n.e.c. | 1 | 0.0 |
Group the less frequent industries into a single 'Other' category
allowed_industry_descriptions = [
'Cafes and Restaurants', 'Takeaway Food Services', 'Pubs, Taverns and Bars', 'Accommodation',
'Bakery Product Manufacturing (Non-factory based)', 'Other Specialised Food Retailing',
'Catering Services', 'Clubs (Hospitality)', 'Sports and Physical Recreation Venues, Grounds and Facilities Operation'
]
# Replace descriptions not in the allowed list with "Other"
business_df['industry_description'] = business_df['industry_description'].apply(lambda x: x if x in allowed_industry_descriptions else 'Other')
Composition of Industry Description
# Calculate the total count of all rows
total_count = business_df['industry_description'].value_counts().sum()
# Get the counts for each industry_description and convert to DataFrame
industry_counts = business_df['industry_description'].value_counts().reset_index()
# Rename the columns for clarity
industry_counts.columns = ['industry_description', 'Count']
# Add a new column for the percentage contribution
industry_counts['Contribution %'] = ((industry_counts['Count'] / total_count) * 100).round(0).astype(int)
# Sort by count for clarity
industry_counts_sorted = industry_counts.sort_values(by='Count', ascending=False)
# Get the maximum value for the 'Count' column
max_count = industry_counts_sorted['Count'].max()
# Add a total row at the bottom
total_row = pd.DataFrame([['Total', industry_counts_sorted['Count'].sum(), 100]],
columns=['industry_description', 'Count', 'Contribution %'])
industry_counts_with_total = pd.concat([industry_counts_sorted, total_row], ignore_index=True)
# Define a function to highlight the row with the maximum 'Count'
def highlight_max_row(row):
    if row['industry_description'] == 'Total':
        return ['background-color: white' for _ in row]
    return ['background-color: lightblue' if row['Count'] == max_count else '' for _ in row]
# Apply the highlight function to the entire DataFrame
industry_counts_with_total.style.apply(highlight_max_row, axis=1)
| industry_description | Count | Contribution % | |
|---|---|---|---|
| 0 | Cafes and Restaurants | 2295 | 75 |
| 1 | Takeaway Food Services | 395 | 13 |
| 2 | Pubs, Taverns and Bars | 231 | 8 |
| 3 | Accommodation | 62 | 2 |
| 4 | Other | 37 | 1 |
| 5 | Bakery Product Manufacturing (Non-factory based) | 23 | 1 |
| 6 | Other Specialised Food Retailing | 11 | 0 |
| 7 | Catering Services | 9 | 0 |
| 8 | Clubs (Hospitality) | 8 | 0 |
| 9 | Sports and Physical Recreation Venues, Grounds and Facilities Operation | 8 | 0 |
| 10 | Total | 3079 | 100 |
Cafes and Restaurants are the highest-contributing industry in the City of Melbourne, at around 75% of all businesses in the data set.¶
Summary Statistics - Total Business Data Set - Cafes, Restaurants, Bars, Taverns and Pubs with seating capacity.¶
# Basic information about the dataset
print(business_df.info())
print("\nSummary statistics:")
print(business_df.describe())
<class 'pandas.core.frame.DataFrame'>
Index: 2131 entries, 0 to 3078
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 clue_small_area 2131 non-null object
1 trading_name 2131 non-null object
2 industry_description 2131 non-null object
3 longitude 2131 non-null float64
4 latitude 2131 non-null float64
5 total_number_of_seats 2131 non-null int64
6 Area 2131 non-null int32
7 industry 2131 non-null int32
dtypes: float64(2), int32(2), int64(1), object(3)
memory usage: 133.2+ KB
None
Summary statistics:
longitude latitude total_number_of_seats Area \
count 2131.000000 2131.000000 2131.000000 2131.000000
mean 144.960695 -37.812818 401.617081 4.091506
std 0.010635 0.008592 779.495584 2.689150
min 144.904228 -37.849719 2.000000 0.000000
25% 144.955725 -37.817185 60.000000 4.000000
50% 144.962503 -37.813288 130.000000 4.000000
75% 144.967170 -37.809415 337.000000 4.000000
max 144.990561 -37.777494 4970.000000 12.000000
industry
count 2131.000000
mean 3.422806
std 2.705242
min 0.000000
25% 2.000000
50% 2.000000
75% 2.000000
max 9.000000
Data Visualization - Cafes, Restaurants, Bars, Taverns and Pubs with seating capacity.¶
Plotting the Composition of Seating Types
seating_counts = business_df['seating_type'].value_counts()
labels = seating_counts.index
sizes = seating_counts.values
colors = plt.get_cmap('Blues')(np.linspace(0.2, 0.7, len(labels)))
fig, ax = plt.subplots()
ax.pie(sizes, colors=colors, labels=labels, autopct='%1.1f%%', startangle=90,
wedgeprops={"linewidth": 1, "edgecolor": "white"})
ax.set(aspect='equal')
plt.title('Composition of Seating Types')
plt.show()
Around 65% of the total venues have indoor seats.
Summed the total seating capacity per location and added a Total Number of Seats column (Indoor + Outdoor).
# Sum the total seats (indoor + outdoor) per location, using longitude as the location key
total_seats = business_df.groupby('longitude')['number_of_seats'].sum().reset_index()
# Rename the column
total_seats.rename(columns={'number_of_seats': 'total_number_of_seats'}, inplace=True)
# Merge the total seats into the original DataFrame
business_df = business_df.drop(columns=['number_of_seats']).merge(total_seats, on='longitude')
print(business_df.head())
business_df.shape
clue_small_area trading_name industry_description \
0 Docklands Yassas Cafes and Restaurants
1 Docklands Fukutontei Ramen Cafes and Restaurants
2 Docklands Yassas Cafes and Restaurants
3 Docklands Carl's Jr Docklands Cafes and Restaurants
4 Docklands Fukutontei Ramen Cafes and Restaurants
seating_type longitude latitude total_number_of_seats
0 Seats - Indoor 144.936589 -37.811859 484
1 Seats - Outdoor 144.936589 -37.811859 484
2 Seats - Outdoor 144.936589 -37.811859 484
3 Seats - Outdoor 144.936589 -37.811859 484
4 Seats - Indoor 144.936589 -37.811859 484
(3079, 7)
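As an aside, the same per-location total can be computed in one step with `groupby().transform`, keying on the full (longitude, latitude) pair so that two distinct venues that happen to share a longitude are not merged. A minimal sketch on toy data (not the real dataset):

```python
import pandas as pd

# Toy frame standing in for business_df
df = pd.DataFrame({
    'longitude': [144.93, 144.93, 144.95],
    'latitude':  [-37.81, -37.81, -37.82],
    'number_of_seats': [100, 50, 30],
})

# transform broadcasts each group's sum back onto the original rows,
# so no separate merge step is needed
df['total_number_of_seats'] = (
    df.groupby(['longitude', 'latitude'])['number_of_seats'].transform('sum')
)
print(df['total_number_of_seats'].tolist())  # [150, 150, 30]
```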
### Drop the Seating Type Column
business_df = business_df.drop(columns=['seating_type'])
### Remove Duplicates
business_df.drop_duplicates(keep='first', inplace=True)
business_df.duplicated().sum()
0
business_df.shape
(2131, 6)
Plotting the Distribution of Seating Capacity
pivot_df = business_df.copy()
business_df['total_number_of_seats'] = pivot_df['total_number_of_seats']
# Create bins for total number of seats
bins = list(range(0, 501, 50)) + [float('inf')]
labels = [f'{i}-{i+50}' for i in range(0, 500, 50)] + ['more than 500']
pivot_df['seats_bin'] = pd.cut(pivot_df['total_number_of_seats'], bins=bins, labels=labels, right=False)
bin_counts = pivot_df['seats_bin'].value_counts().sort_index()
# Calculate percentages
total_properties = bin_counts.sum()
percentages = (bin_counts / total_properties * 100).round(1)
plt.figure(figsize=(10, 7))
ax = bin_counts.plot(kind='bar', color='skyblue', edgecolor='black')
plt.title('Distribution of Seating Capacity')
plt.xlabel('Total Seats Bin')
plt.ylabel('Number of Properties')
plt.xticks(rotation=45)
plt.tight_layout()
for p in ax.patches:
height = p.get_height()
bin_label = p.get_x() + p.get_width() / 2
bin_label_name = ax.get_xticklabels()[int(bin_label)].get_text()
percentage = percentages[bin_label_name]
ax.text(p.get_x() + p.get_width() / 2, height, f'{percentage:.0f}%',
ha='center', va='bottom', fontsize=10)
plt.show()
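The `pd.cut` call above uses `right=False`, which makes each bin left-closed (so a venue with exactly 50 seats falls into the '50-100' bin, not '0-50'). A minimal illustration:

```python
import pandas as pd

# right=False gives intervals [0, 50), [50, 100), [100, inf)
bins = [0, 50, 100, float('inf')]
labels = ['0-50', '50-100', 'more than 100']

s = pd.Series([10, 50, 99, 250])
binned = pd.cut(s, bins=bins, labels=labels, right=False)
print(binned.tolist())  # ['0-50', '50-100', '50-100', 'more than 100']
```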
More than 40% of the restaurants and bars have a seating capacity of fewer than 100.
### Plot the Distribution of Industries
industry_counts = business_df['industry_description'].value_counts()
# Calculate percentages
total_count = industry_counts.sum()
percentages = (industry_counts / total_count) * 100
fig, ax = plt.subplots(figsize=(10, 7))
bar_container = ax.bar(industry_counts.index, industry_counts, color='skyblue',edgecolor='black')
ax.set_xlabel('Industry Description', fontsize=10)
ax.set_ylabel('Number of Properties', fontsize=10)
ax.set_title('Industry wise Café, Restaurant, Bistro Seats, Bar, Tavern, Pub Distribution', fontsize=12)
ax.set_ylim(0, max(industry_counts) + max(industry_counts) * 0.1)
plt.xticks(rotation=90, fontsize=8)
plt.yticks(fontsize=8)
for p in ax.patches:
height = p.get_height()
bin_label = p.get_x() + p.get_width() / 2
bin_label_name = ax.get_xticklabels()[int(bin_label)].get_text()
percentage = percentages[bin_label_name]
ax.text(p.get_x() + p.get_width() / 2, height, f'{percentage:.0f}%',
ha='center', va='bottom', fontsize=10)
plt.show()
Cafes and Restaurants is the largest contributing industry in the City of Melbourne, accounting for more than 72% of venues.
Displaying industry-wise seating capacity in a stacked bar chart
# Group by seats_bin and industry_description, and count the occurrences
grouped = pivot_df.groupby(['seats_bin', 'industry_description'], observed=False).size().unstack(fill_value=0)
# Calculate the percentage contribution of each industry description within each seats_bin
percentages = grouped.divide(grouped.sum(axis=1), axis=0) * 100
ax = grouped.plot(kind='bar', stacked=True, figsize=(10, 7), color=(sns.color_palette("Blues", n_colors=10)))
for i, bin in enumerate(grouped.index):
max_contrib_idx = percentages.loc[bin].idxmax()
max_contrib_val = percentages.loc[bin].max()
ax.annotate(f'{max_contrib_val:.0f}%\n{max_contrib_idx}',
xy=(i, grouped.loc[bin, :].cumsum()[max_contrib_idx] - grouped.loc[bin, max_contrib_idx]/2),
ha='center', va='bottom', fontsize=8, color='black', rotation = 90)
plt.title('Distribution of Seating Capacity by Industry Description')
plt.xlabel('Total Seats Bin')
plt.ylabel('Number of Properties')
plt.xticks(rotation=45)
plt.legend(title='Industry Description')
plt.tight_layout()
plt.show()
Cafes and Restaurants is the largest contributing industry across all seating-capacity ranges.
## Area-wise Industry Distribution
area_counts = business_df.groupby(['industry_description', 'clue_small_area']).size().reset_index(name='Count')
num_categories = area_counts['industry_description'].nunique()
nrows = (num_categories // 3) + 1
ncols = 3
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(15, 5 * nrows))
axes = axes.flatten()
X = area_counts.groupby('industry_description')
num = 0
for category, group in X:
df = pd.DataFrame(group)
df = df.sort_values(by='Count', ascending=False)
x_labels = df['clue_small_area'].values
y_values = df['Count'].values
ax = axes[num]
bars = ax.bar(x_labels, y_values, color='skyblue')
ax.set_title(f'Distribution of {category}')
ax.set_xlabel('Area')
ax.set_ylabel('Number of Properties')
ax.set_xticks(range(len(x_labels)))
ax.set_xticklabels(x_labels, rotation=90)
total = y_values.sum()
for bar, count in zip(bars, y_values):
height = bar.get_height()
percentage = f'{(count / total * 100):.0f}%'
ax.annotate(percentage, xy=(bar.get_x() + bar.get_width() / 2, height),
xytext=(0, 3),
textcoords="offset points", ha='center', va='top')
num += 1
for i in range(num, len(axes)):
axes[i].axis('off')
plt.tight_layout()
plt.show()
Other than catering services, all industries are mainly concentrated around Melbourne (CBD), while most catering services are in the Docklands area.
Displaying the cafe and bar locations on a map using Folium
map = folium.Map(location=[cafes_and_bars_df['latitude'].mean(), cafes_and_bars_df['longitude'].mean()], zoom_start=10)
# Add markers for each location
for index, row in cafes_and_bars_df.iterrows():
folium.Marker([row['latitude'], row['longitude']], popup=row['industry_description'], icon=folium.Icon(color='blue')).add_to(map)
legend_html = """
<div style="position: fixed;
bottom: 50px; left: 50px; width: 200px; height:50px;
border:2px grey; font-size:12px;
background-color:white;">
</div>
"""
# Add legend HTML to the map
map.get_root().html.add_child(folium.Element(legend_html))
title_html = """
<h3 style="text-align: center; margin: 10px 0;">Café, Restaurant, Bistro Seats, Bar, Tavern, Pub Locations</h3>
"""
map.get_root().html.add_child(folium.Element(title_html))
display(map)
Parking Bays
I performed several data cleaning steps:
- Converted the lastupdated column to datetime and filtered out past data
- Dropped the roadsegmentid, kerbsideid, and lastupdated columns
- Renamed the roadsegmentdescription column to location
- Added a new parking_type column
- Checked for duplicate rows and removed them
parkingbays_df.head()
| | roadsegmentid | kerbsideid | roadsegmentdescription | latitude | longitude | lastupdated |
|---|---|---|---|---|---|---|
| 0 | 22730 | NaN | Park Street between Mason Street and Randall P... | -37.836245 | 144.982021 | 2022-08-31 |
| 1 | 22730 | NaN | Park Street between Mason Street and Randall P... | -37.835800 | 144.982115 | 2022-08-31 |
| 2 | 20013 | 5701 | Lonsdale Street between William Street and Kin... | -37.814238 | 144.955451 | 2023-10-02 |
| 3 | 20013 | 23444 | Lonsdale Street between William Street and Kin... | -37.814271 | 144.955334 | 2023-10-02 |
| 4 | 22268 | NaN | Clowes Street between Anderson Street and Wals... | -37.830568 | 144.984713 | 2022-08-31 |
parkingbays_df.shape
(19162, 6)
# Convert the 'lastupdated' column to datetime
parkingbays_df['lastupdated'] = pd.to_datetime(parkingbays_df['lastupdated'])
### Filtering Latest Data
parkingbays_df = parkingbays_df[parkingbays_df['lastupdated'].dt.year == 2023 ]
# Drop columns
parkingbays_df = parkingbays_df.drop(columns=['roadsegmentid','kerbsideid', 'lastupdated'])
# Rename roadsegmentdescription column as location
parkingbays_df = parkingbays_df.rename(columns={'roadsegmentdescription': 'location'})
#Added Parking Type column
parkingbays_df['parking_type'] = 'Parking Bay'
parkingbays_df.nunique()
location         522
latitude        7625
longitude       7616
parking_type       1
dtype: int64
parkingbays_df.drop_duplicates(keep='first', inplace=True)
parkingbays_df.duplicated().sum()
0
Parking Meters
I performed several data cleaning steps:
- Dropped the barcode, creditcard, tapandgo, location, asset_id, and meter_id columns
- Renamed the location_description column to location
- Added a new parking_type column
- Checked for duplicate rows and removed them
parkingmeters_df.head()
| | barcode | creditcard | tapandgo | longitude | latitude | location | asset_id | meter_id | location_description |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MPM1629529 | Yes | Yes | 144.953144 | -37.781968 | -37.7819682, 144.9531444 | 1629529 | ZOOT8 | Royal Park |
| 1 | MPM1629539 | Yes | Yes | 144.950263 | -37.787154 | -37.7871538, 144.9502628 | 1629539 | ZOOT23A | Royal Park |
| 2 | MPM1629546 | Yes | Yes | 144.952908 | -37.781829 | -37.7818288, 144.9529081 | 1629546 | ZOOT8A | Royal Park |
| 3 | MPM1629570 | Yes | Yes | 144.950592 | -37.787432 | -37.7874323, 144.9505921 | 1629570 | ZOOT23 | Royal Park |
| 4 | MPM1629571 | Yes | Yes | 144.950776 | -37.781745 | -37.7817446, 144.9507757 | 1629571 | ZOOT2 | Royal Park, approximately 73m NW of Leopard Lo... |
parkingmeters_df.shape
(1258, 9)
# Drop columns
parkingmeters_df = parkingmeters_df.drop(columns=['barcode','creditcard', 'tapandgo','location','asset_id', 'meter_id'])
# Rename the location_description column
parkingmeters_df = parkingmeters_df.rename(columns={'location_description': 'location'})
#Added parking Type column
parkingmeters_df['parking_type'] = 'Parking Meter'
parkingmeters_df.nunique()
longitude       1257
latitude        1257
location        1196
parking_type       1
dtype: int64
### Remove Duplicates
parkingmeters_df.drop_duplicates(keep='first', inplace=True)
parkingmeters_df.duplicated().sum()
0
Create Total Parking Data Set
I created a single data frame by concatenating the parking meters and parking bays data sets:
- Checked and handled null values in the merged data set
- Checked for duplicate rows and removed them
parking_df = pd.concat([parkingmeters_df, parkingbays_df], ignore_index=True)
parking_df.head()
| | longitude | latitude | location | parking_type |
|---|---|---|---|---|
| 0 | 144.953144 | -37.781968 | Royal Park | Parking Meter |
| 1 | 144.950263 | -37.787154 | Royal Park | Parking Meter |
| 2 | 144.952908 | -37.781829 | Royal Park | Parking Meter |
| 3 | 144.950592 | -37.787432 | Royal Park | Parking Meter |
| 4 | 144.950776 | -37.781745 | Royal Park, approximately 73m NW of Leopard Lo... | Parking Meter |
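Note that `pd.concat` aligns columns by name, not position, so the two source frames' differing column orders are harmless. A minimal sketch with toy rows standing in for `parkingmeters_df` and `parkingbays_df`:

```python
import pandas as pd

# Toy frames with deliberately different column orders
meters = pd.DataFrame({'longitude': [144.95], 'latitude': [-37.78],
                       'location': ['Royal Park'], 'parking_type': ['Parking Meter']})
bays = pd.DataFrame({'location': ['Park Street'], 'latitude': [-37.83],
                     'longitude': [144.98], 'parking_type': ['Parking Bay']})

# Columns are matched by name; ignore_index renumbers the rows
combined = pd.concat([meters, bays], ignore_index=True)
print(combined['parking_type'].tolist())  # ['Parking Meter', 'Parking Bay']
```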
parking_df.nunique()
longitude       8839
latitude        8863
location        1714
parking_type       2
dtype: int64
## Check Null Values
parking_df.isna().sum()  # Confirm there are no null values
longitude       0
latitude        0
location        0
parking_type    0
dtype: int64
Data Visualization - Total Parking Data Set
Plotting a pie chart to visualise the composition of parking types
parking_counts = parking_df['parking_type'].value_counts()
labels = [f'{label} ({count})' for label, count in zip(parking_counts.index, parking_counts.values)]
sizes = parking_counts.values
colors = plt.get_cmap('Blues')(np.linspace(0.2, 0.7, len(labels)))
fig, ax = plt.subplots()
ax.pie(sizes, colors=colors, labels=labels, autopct='%1.1f%%', startangle=90,
wedgeprops={"linewidth": 1, "edgecolor": "white"})
ax.set(aspect='equal')
plt.title('Composition of Parking Type')
plt.show()
Displaying parking bay and parking meter locations on a map using Folium
map = folium.Map(location=[parking_df['latitude'].mean(), parking_df['longitude'].mean()], zoom_start=10)
# Add markers for each location
for index, row in parking_df.iterrows():
folium.Marker([row['latitude'], row['longitude']], popup=row['location'], icon=folium.Icon(color='blue')).add_to(map)
legend_html = """
<div style="position: fixed;
bottom: 50px; left: 50px; width: 200px; height:50px;
border:2px grey; font-size:12px;
background-color:white;">
</div>
"""
# Add legend HTML to the map
map.get_root().html.add_child(folium.Element(legend_html))
title_html = """
<h3 style="text-align: center; margin: 10px 0;">Parking Locations</h3>
"""
map.get_root().html.add_child(folium.Element(title_html))
display(map)